SQL Server 2008 : Performance Tuning - Partitioning

12/7/2010 7:40:39 PM

Organizations collect more data and retain data for longer than ever before. The phenomenal growth of the storage manufacturing industry over the past 10 years is a testament to continually increasing data collection. Given adequate capacity, storing large quantities of data within SQL Server is no big problem, until we need to retrieve some of that data or perform maintenance. Retrieving single rows or running range searches against multiterabyte databases while maintaining good response times presents new challenges.

Partitioning was first available in SQL Server 7.0, although in these early versions application logic was required to determine the partition holding a specific row. In SQL Server 2000 it was possible to define a view that unified the data, and in SQL Server 2005 table partitions became completely transparent to applications. SQL Server 2008 Enterprise edition provides the next generation of table and index partitioning, which introduces a round-robin thread model to satisfy queries accessing multiple partitions. Additionally, SQL Server 2008 includes a new level of lock escalation, which means locks can escalate from row or page locks to partition locks. This differs from SQL Server 2005, where row or page locks could only be escalated directly to table locks.
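Partition-level escalation is not switched on by default: a table's LOCK_ESCALATION option defaults to TABLE, which keeps the SQL Server 2005 behavior. The sketch below enables it with AUTO and reads the setting back from sys.tables. The table name dbo.LockEscalationDemo is hypothetical; on a non-partitioned table AUTO behaves like TABLE, so the option only changes behavior once the table is partitioned.

```sql
-- Hypothetical demo table, created only if it does not already exist.
IF OBJECT_ID(N'dbo.LockEscalationDemo', N'U') IS NULL
    CREATE TABLE dbo.LockEscalationDemo
    (
        OrderID   int      NOT NULL PRIMARY KEY,
        OrderDate datetime NOT NULL
    );
GO

-- TABLE (the default) escalates row/page locks straight to the table;
-- AUTO allows a partitioned table to escalate to the partition instead.
ALTER TABLE dbo.LockEscalationDemo SET (LOCK_ESCALATION = AUTO);
GO

-- Read the current setting back.
SELECT name, lock_escalation_desc
FROM sys.tables
WHERE object_id = OBJECT_ID(N'dbo.LockEscalationDemo');
```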

Horizontal Partitioning

Horizontal partitioning involves dividing a large table into a number of smaller tables, each containing all columns for a subset of rows. Because each table is much smaller, access is typically more efficient and therefore faster. Partitioning maintains data integrity, and it is possible to partition on almost any column; however, date ranges are the most common choice. This allows administrators to separate recent (usually more active) data from older archive data, improving performance for the frequently accessed data (usually recent rows).
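Before SQL Server 2005, horizontal partitioning like this was typically done by hand with the view-based approach mentioned earlier: separate member tables, each holding a subset of rows, unified by a UNION ALL view. The sketch below is that older technique, not SQL Server 2008 table partitioning; the table and view names (dbo.Orders2002, dbo.Orders2003, dbo.OrdersByYear) are hypothetical. The CHECK constraints on OrderDate are what let the optimizer avoid member tables that cannot contain matching rows.

```sql
-- Hypothetical member tables: identical columns, one year of orders each.
CREATE TABLE dbo.Orders2002
(
    OrderID   int      NOT NULL,
    OrderDate datetime NOT NULL
        CONSTRAINT CK_Orders2002_OrderDate
        CHECK (OrderDate >= '20020101' AND OrderDate < '20030101'),
    CONSTRAINT PK_Orders2002 PRIMARY KEY (OrderID, OrderDate)
);

CREATE TABLE dbo.Orders2003
(
    OrderID   int      NOT NULL,
    OrderDate datetime NOT NULL
        CONSTRAINT CK_Orders2003_OrderDate
        CHECK (OrderDate >= '20030101' AND OrderDate < '20040101'),
    CONSTRAINT PK_Orders2003 PRIMARY KEY (OrderID, OrderDate)
);
GO

-- The view presents the member tables as a single logical table.
CREATE VIEW dbo.OrdersByYear
AS
SELECT OrderID, OrderDate FROM dbo.Orders2002
UNION ALL
SELECT OrderID, OrderDate FROM dbo.Orders2003;
GO

INSERT INTO dbo.Orders2002 (OrderID, OrderDate) VALUES (1, '20020615');
INSERT INTO dbo.Orders2003 (OrderID, OrderDate) VALUES (2, '20030615');

-- The CHECK constraints allow the optimizer to skip dbo.Orders2002 here.
SELECT OrderID, OrderDate
FROM dbo.OrdersByYear
WHERE OrderDate >= '20030101';
```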

Vertical Partitioning

This method splits a large table into a number of smaller tables; each table contains every row, but only a subset of the columns. The database normalization process provides vertical partitioning by moving attributes that do not depend on the primary key into separate tables, which are related through a primary key/foreign key constraint. Consider this method with caution, since retrieving all columns for a given row requires a join between the tables, which could be expensive in terms of performance.
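A minimal sketch of a vertical split, assuming a hypothetical customer table whose rarely read, wide column is moved into a second table sharing the same key. Both tables hold every row; queries that need only the narrow columns touch one small table, while reassembling the full row costs the join the paragraph above warns about.

```sql
-- Frequently read columns stay in the main table.
CREATE TABLE dbo.Customer
(
    CustomerID int           NOT NULL CONSTRAINT PK_Customer PRIMARY KEY,
    Name       nvarchar(100) NOT NULL,
    Email      nvarchar(255) NULL
);

-- The wide, rarely read column lives in a second table with the same key,
-- related back to dbo.Customer by a primary key/foreign key constraint.
CREATE TABLE dbo.CustomerProfile
(
    CustomerID int           NOT NULL CONSTRAINT PK_CustomerProfile PRIMARY KEY
        CONSTRAINT FK_CustomerProfile_Customer
        REFERENCES dbo.Customer (CustomerID),
    Biography  nvarchar(max) NULL
);
GO

INSERT INTO dbo.Customer (CustomerID, Name, Email)
VALUES (1, N'Contoso', N'info@contoso.com');
INSERT INTO dbo.CustomerProfile (CustomerID, Biography)
VALUES (1, N'Long free-text profile');

-- Narrow query: one small table.
SELECT CustomerID, Name FROM dbo.Customer;

-- Full row: requires a join between the two tables.
SELECT c.CustomerID, c.Name, c.Email, p.Biography
FROM dbo.Customer AS c
JOIN dbo.CustomerProfile AS p ON p.CustomerID = c.CustomerID;
```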

Filegroups

Successful implementations of table partitioning improve the availability and performance of the entire table, since many operations can be performed in parallel. Availability can be improved because backup and restore operations can be performed on an individual filegroup. Performance can also be improved because CHECKDB and regular scan and seek operations can be performed in parallel when partitions are implemented on their own filegroups.
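Consistency checks can also be narrowed to a single filegroup with DBCC CHECKFILEGROUP rather than running DBCC CHECKDB against the whole database. The sketch assumes the Orders database and OrdersGroup1 filegroup from the CREATE DATABASE example in this section; running one such command per filegroup from separate sessions (for example, separate scheduled jobs) is one hedged way to spread the checks out.

```sql
USE Orders;
GO

-- Check only the objects stored on OrdersGroup1, suppressing informational messages.
DBCC CHECKFILEGROUP (N'OrdersGroup1') WITH NO_INFOMSGS;
GO
```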

It’s common for SQL Server databases to grow to hundreds of gigabytes, and multiterabyte databases are no longer unusual. Even with the fastest disk storage and backup targets, full backups of databases this size are likely to take many hours to complete. Filegroups help by allowing partial backups, or by allowing backups to run in parallel. Every database has a primary filegroup, and if you’re going to create multiple data files it’s recommended to create these on a secondary filegroup.
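The filegroup-level backups described above can be sketched as follows, assuming the Orders database from this section. The X:\Backup\ location is hypothetical, and switching to the FULL recovery model is an assumption: backups of individual read/write filegroups are not available under the SIMPLE model, whereas the partial (READ_WRITE_FILEGROUPS) backup is designed to work under SIMPLE as well.

```sql
-- Assumption: the database may use the FULL recovery model.
ALTER DATABASE Orders SET RECOVERY FULL;
GO

-- A full database backup provides the base for later file backups.
BACKUP DATABASE Orders
    TO DISK = N'X:\Backup\Orders_full.bak';
GO

-- Back up one filegroup on its own; others can follow at other times or in parallel.
BACKUP DATABASE Orders
    FILEGROUP = N'OrdersGroup1'
    TO DISK = N'X:\Backup\Orders_OrdersGroup1.bak';
GO

-- Partial backup: the primary filegroup plus all read/write filegroups,
-- skipping any filegroups that have been marked read-only.
BACKUP DATABASE Orders
    READ_WRITE_FILEGROUPS
    TO DISK = N'X:\Backup\Orders_partial.bak';
GO
```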

Partitions can operate within a single filegroup; however, to gain the full benefit of table partitioning it is recommended that each partition reside on its own filegroup. Each filegroup (and therefore each partition) can then be stored on a different disk, which can improve I/O throughput and therefore database performance. The following example creates a new database with a primary filegroup (required) and four further filegroups, each with one data file:

CREATE DATABASE Orders
-- The primary filegroup and its primary data file are required;
-- this file name and path are illustrative.
ON PRIMARY
    (NAME = Orders_dat,
    FILENAME = 'D:\MDF\Orders_dat.mdf'),
-- Each secondary filegroup has one data file on its own drive.
FILEGROUP OrdersGroup1
    (NAME = OrdersGrp1Fi1_dat,
    FILENAME = 'F:\MDF\OrdersG1Fi1_dat.ndf',
    SIZE = 5120MB,
    FILEGROWTH = 1024MB),
FILEGROUP OrdersGroup2
    (NAME = OrdersGrp2Fi1_dat,
    FILENAME = 'G:\MDF\OrdersG2Fi1_dat.ndf',
    SIZE = 5120MB,
    FILEGROWTH = 1024MB),
FILEGROUP OrdersGroup3
    (NAME = OrdersGrp3Fi1_dat,
    FILENAME = 'H:\MDF\OrdersG3Fi1_dat.ndf',
    SIZE = 5120MB,
    FILEGROWTH = 1024MB),
FILEGROUP OrdersGroup4
    (NAME = OrdersGrp4Fi1_dat,
    FILENAME = 'I:\MDF\OrdersG4Fi1_dat.ndf',
    SIZE = 5120MB,
    FILEGROWTH = 1024MB)
LOG ON
    (NAME = Orders_log,
    FILENAME = 'E:\LDF\Orders_log.ldf',
    SIZE = 5MB,
    FILEGROWTH = 1MB)
GO
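To confirm the resulting layout, a query along these lines lists each file with its filegroup. It uses only the catalog views sys.database_files and sys.filegroups; the LEFT JOIN keeps the log file, which belongs to no filegroup.

```sql
USE Orders;
GO

-- One row per database file; filegroup_name is NULL for the log file.
SELECT fg.name            AS filegroup_name,
       df.name            AS logical_file_name,
       df.physical_name,
       df.size * 8 / 1024 AS size_mb   -- size is stored in 8-KB pages
FROM sys.database_files AS df
LEFT JOIN sys.filegroups AS fg
    ON fg.data_space_id = df.data_space_id
ORDER BY df.file_id;
```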

The concepts and goals of partitioning are fairly easy to grasp; the hardest part is understanding the language around the implementation of table partitioning. Essentially, there are two concepts, the partition function and the partition scheme, both introduced by a comprehensive wizard (the Create Partition Wizard in SQL Server Management Studio) and accompanied by corresponding T-SQL commands.

Selecting a Partition Key and Number of Partitions

Almost any column can be used as a partition key, and this column forms the logical division between partitions. The partition function implements the partition key, but not the data placement on disk. It’s important to know the data and data access patterns, since this knowledge will help you select the partition key and the number of partitions. If there are logical boundaries or groupings in the data, base the partition key on them; for example, if orders are typically queried by calendar month or fiscal quarter, these could be a natural choice.
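Getting to know the data can start with a simple distribution query. In this sketch a temporary table of sample rows (#OrderSample, hypothetical) stands in for a real orders table; counting rows per year and calendar quarter shows whether yearly or quarterly boundaries would produce reasonably even partitions.

```sql
-- Hypothetical sample data standing in for a real orders table.
CREATE TABLE #OrderSample (OrderID int NOT NULL, OrderDate datetime NOT NULL);

INSERT INTO #OrderSample (OrderID, OrderDate)
VALUES (1, '20020315'), (2, '20021120'), (3, '20030704'),
       (4, '20031001'), (5, '20031215');

-- Rows per year and calendar quarter: candidate partition boundaries.
SELECT YEAR(OrderDate)              AS order_year,
       DATEPART(quarter, OrderDate) AS order_quarter,
       COUNT(*)                     AS order_count
FROM #OrderSample
GROUP BY YEAR(OrderDate), DATEPART(quarter, OrderDate)
ORDER BY order_year, order_quarter;
```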

Partition Function

The partition function maps rows to partitions based on the partition key, and it specifies the boundary values between partitions. RANGE LEFT or RANGE RIGHT determines which partition each boundary value belongs to: with RANGE LEFT the boundary value is the upper bound of the partition on its left, and with RANGE RIGHT it is the lower bound of the partition on its right. LEFT is the default. The following example creates a partition function that partitions orders based on order date:

CREATE PARTITION FUNCTION [pf_Orders](datetime) AS RANGE LEFT FOR
VALUES(N'2002-01-01T00:00:00',
       N'2003-01-01T00:00:00',
       N'2004-01-01T00:00:00')

This statement creates four partitions, even though only three values are listed: n boundary values always produce n + 1 partitions. The boundary values are shown as “equal to or less than” because the function is created with RANGE LEFT. The data is split between the partitions as follows:

Partition Number	Partition 1	Partition 2	Partition 3	Partition 4
RANGE LEFT	<= 2002-01-01	> 2002-01-01 AND <= 2003-01-01	> 2003-01-01 AND <= 2004-01-01	> 2004-01-01
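The $PARTITION function returns the partition number a partition function assigns to a value, which makes the mapping in the table easy to verify. The sketch assumes it runs in the database where pf_Orders was created. The second part creates a hypothetical RANGE RIGHT variant, pf_OrdersRight, to show the difference at a boundary: with RANGE LEFT, midnight on 2002-01-01 lands in partition 1 with the 2001 orders, while with RANGE RIGHT it starts partition 2, so each calendar year falls entirely within one partition. This is why RANGE RIGHT is often preferred for date boundaries.

```sql
-- Partition numbers pf_Orders (RANGE LEFT) assigns to some sample dates.
SELECT d.OrderDate,
       $PARTITION.pf_Orders(d.OrderDate) AS partition_number
FROM (VALUES (CAST('20011231' AS datetime)),
             (CAST('20020101' AS datetime)),            -- boundary value
             (CAST('20020101 00:00:01' AS datetime)),
             (CAST('20040615' AS datetime))) AS d (OrderDate);
GO

-- Hypothetical RANGE RIGHT version with the same boundary values.
CREATE PARTITION FUNCTION pf_OrdersRight (datetime)
AS RANGE RIGHT FOR VALUES ('20020101', '20030101', '20040101');
GO

-- With RANGE RIGHT the boundary value starts the next partition.
SELECT $PARTITION.pf_OrdersRight(CAST('20020101' AS datetime)) AS partition_number;
```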

Partition Scheme

The partition function determines which partition each row belongs to; the partition scheme controls the mapping between partitions and filegroups, and therefore where the data is physically placed. Performance gains can often be realized by using a one-to-one mapping between partitions and filegroups, and further by placing each filegroup on its own disk.

The following example creates a partition scheme, mapping each partition to its own filegroup in the Orders database created earlier:

CREATE PARTITION SCHEME [ps_Orders] AS PARTITION [pf_Orders]
TO ([OrdersGroup1], [OrdersGroup2], [OrdersGroup3], [OrdersGroup4])

The partition scheme definition will map the partitions to filegroups as shown in Table 1.

Table 1. Sample Partition Scheme
Partition Number	Partition 1	Partition 2	Partition 3	Partition 4
RANGE LEFT	<= 2002-01-01	> 2002-01-01 AND <= 2003-01-01	> 2003-01-01 AND <= 2004-01-01	> 2004-01-01
Filegroup	OrdersGroup1	OrdersGroup2	OrdersGroup3	OrdersGroup4
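The function and scheme do not partition anything on their own; a table or index becomes partitioned when it is created ON the partition scheme, naming the partitioning column. A minimal sketch, assuming ps_Orders exists as defined above and using a hypothetical dbo.Orders table: the partitioning column must be part of the clustered primary key, and sys.partitions then reports the row count of each partition.

```sql
-- Hypothetical orders table stored on ps_Orders, partitioned by OrderDate.
CREATE TABLE dbo.Orders
(
    OrderID   int      NOT NULL,
    OrderDate datetime NOT NULL,
    Amount    money    NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderDate, OrderID)
) ON ps_Orders (OrderDate);
GO

-- One sample row for each of the four partitions.
INSERT INTO dbo.Orders (OrderID, OrderDate, Amount)
VALUES (1, '20011115', 10.00),
       (2, '20020601', 25.00),
       (3, '20030601', 40.00),
       (4, '20050101', 15.00);

-- Row count per partition of the clustered index (index_id = 1).
SELECT p.partition_number, p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.Orders')
  AND p.index_id = 1
ORDER BY p.partition_number;
```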

Moving Data between Partitions

Partitioning provides performance and manageability benefits, and migrating to a partitioned table is relatively pain free. There are three options for moving data between partitions:

Split partition
Merge partition
Switch partition
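As for migrating an existing table in the first place, one common route is to rebuild its clustered index on the partition scheme with DROP_EXISTING, which moves the rows into the partitions in a single statement. The sketch assumes ps_Orders exists as defined above; dbo.OrderHistory and its index name are hypothetical, and on a large table this rebuild is a substantial operation that should be scheduled accordingly.

```sql
-- Hypothetical existing, non-partitioned table with a clustered index.
CREATE TABLE dbo.OrderHistory
(
    OrderID   int      NOT NULL,
    OrderDate datetime NOT NULL
);
CREATE CLUSTERED INDEX CIX_OrderHistory ON dbo.OrderHistory (OrderDate);
GO

INSERT INTO dbo.OrderHistory (OrderID, OrderDate)
VALUES (1, '20011115'), (2, '20030601');
GO

-- Recreate the same clustered index on the partition scheme; DROP_EXISTING
-- replaces the old index instead of dropping and recreating it separately.
CREATE CLUSTERED INDEX CIX_OrderHistory
    ON dbo.OrderHistory (OrderDate)
    WITH (DROP_EXISTING = ON)
    ON ps_Orders (OrderDate);
GO
```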

Partition splitting is implemented through the partition function: it adds a boundary value, dividing an existing partition in two. A split is commonly used to add a new partition at the end of an existing range. Before a split, the partition scheme that uses the function must have a filegroup designated NEXT USED (set with ALTER PARTITION SCHEME) to hold the new partition. Merging is also administered through the partition function and combines two adjacent partitions by removing the boundary between them. Both split and merge use the ALTER PARTITION FUNCTION T-SQL command; for example:

-- Adds 500 as a new boundary value to an existing partition function named pQuantity
ALTER PARTITION FUNCTION pQuantity()
SPLIT RANGE(500)
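Applied to the pf_Orders function and ps_Orders scheme from this section, a split at the end of the range followed by a merge at the start might look like the sketch below. Reusing OrdersGroup4 as the NEXT USED filegroup is an assumption; a newly added filegroup could be used instead. The 2005 boundary is illustrative.

```sql
-- Mark the filegroup that will receive the partition created by the next split.
ALTER PARTITION SCHEME ps_Orders
    NEXT USED [OrdersGroup4];
GO

-- Add a 2005 boundary: the last partition (> 2004-01-01) is divided into
-- (> 2004-01-01 AND <= 2005-01-01) and (> 2005-01-01).
ALTER PARTITION FUNCTION pf_Orders()
    SPLIT RANGE ('20050101');
GO

-- Remove the oldest boundary: the first two partitions become one.
ALTER PARTITION FUNCTION pf_Orders()
    MERGE RANGE ('20020101');
GO
```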

Finally, probably the most useful option is SWITCH, which moves a complete partition into or out of a table. Because SWITCH changes only metadata, it completes almost instantly regardless of how much data the partition holds. Be aware that ALTER TABLE ... SWITCH is considered a schema modification and therefore requires a schema modification (Sch-M) lock on the tables involved. Combined with SPLIT and MERGE, usually run from a scheduled job, SWITCH makes it possible to implement a sliding window, in which partitions are rotated on a regular schedule. Here’s an example of the sliding window:

Partition 1 – Current week
Partition 2 – Previous 2 weeks
Partition 3 – Previous 3 months
Partition 4 – Previous 3 to 6 months
Partition 5 – Everything older than 6 months

In this scenario, regularly scheduled partition maintenance keeps the window sliding and provides optimal performance for recent data. Older data is still available; however, retrieval times will likely be longer, since the older partitions hold more rows and larger indexes.
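The SWITCH step at the heart of a sliding window can be sketched as follows, assuming ps_Orders exists as defined earlier. The tables dbo.SalesOrders and dbo.SalesOrdersArchive are hypothetical. SWITCH requires identical columns and indexes, the target partition must be empty, and source and target must be on the same filegroup; creating both tables on the same partition scheme satisfies the filegroup rule for matching partition numbers.

```sql
-- Hypothetical source and archive tables with identical structure and scheme.
CREATE TABLE dbo.SalesOrders
(
    OrderID   int      NOT NULL,
    OrderDate datetime NOT NULL,
    CONSTRAINT PK_SalesOrders PRIMARY KEY CLUSTERED (OrderDate, OrderID)
) ON ps_Orders (OrderDate);

CREATE TABLE dbo.SalesOrdersArchive
(
    OrderID   int      NOT NULL,
    OrderDate datetime NOT NULL,
    CONSTRAINT PK_SalesOrdersArchive PRIMARY KEY CLUSTERED (OrderDate, OrderID)
) ON ps_Orders (OrderDate);
GO

-- One old row (partition 1) and one newer row (partition 3).
INSERT INTO dbo.SalesOrders (OrderID, OrderDate)
VALUES (1, '20011115'), (2, '20030601');
GO

-- Move the oldest partition into the empty matching partition of the archive.
-- Only metadata changes; no rows are physically copied.
ALTER TABLE dbo.SalesOrders
    SWITCH PARTITION 1 TO dbo.SalesOrdersArchive PARTITION 1;
GO
```

In a full sliding window, a scheduled job would typically follow the switch with a MERGE of the emptied boundary and a NEXT USED plus SPLIT at the other end of the range.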